Meteor, M-BLEU and M-TER: Evaluation Metrics for High-Correlation with Human Rankings of Machine Translation Output
Authors
Abstract
This paper describes our submissions to the machine translation evaluation shared task at ACL WMT-08. Our primary submission is the Meteor metric, tuned to optimize correlation with human rankings of translation hypotheses. We show a significant improvement in correlation compared to the earlier version of the metric, which was tuned to optimize correlation with traditional adequacy and fluency judgments. We also describe m-bleu and m-ter, enhanced versions of the two other widely used metrics bleu and ter, which extend the exact word matching used in those metrics with the flexible matching, based on stemming and WordNet synonymy, used in Meteor.
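The flexible matching referred to above proceeds in stages: exact surface match first, then stem match, then WordNet synonymy. Below is a minimal sketch of that cascade, assuming NLTK's PorterStemmer and WordNet data as stand-ins for the matchers actually used; the function names are illustrative, not the authors' implementation.

    # Minimal sketch of Meteor-style staged word matching.
    # Assumes NLTK with the WordNet data installed (nltk.download('wordnet')).
    from nltk.corpus import wordnet
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def wordnet_lemmas(word):
        # All WordNet lemma names for the word, lowercased.
        return {lemma.name().lower()
                for synset in wordnet.synsets(word)
                for lemma in synset.lemmas()}

    def words_match(hyp_word, ref_word):
        # Stage 1: exact match; stage 2: identical stems; stage 3: WordNet synonymy.
        h, r = hyp_word.lower(), ref_word.lower()
        if h == r:
            return True
        if stemmer.stem(h) == stemmer.stem(r):
            return True
        return h in wordnet_lemmas(r) or r in wordnet_lemmas(h)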
Similar resources
Meteor, m-bleu and m-ter: Flexible Matching and Parameter Tuning for High-Correlation with Human Judgments of Machine Translation Quality
We describe our submission to the NIST Metrics for Machine Translation Challenge, consisting of four metrics: two versions of Meteor, plus m-bleu and m-ter. We first give a brief description of Meteor. That is followed by a description of m-bleu and m-ter, enhanced versions of the two other widely used metrics bleu and ter, which extend the exact word matching used in these metrics with the flexible matching ...
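One way to picture how m-bleu can reuse an off-the-shelf bleu implementation: greedily align hypothesis and reference words with a flexible matcher, then rewrite matched reference words to their hypothesis surface forms so that plain n-gram matching treats them as exact. The sketch below simplifies Meteor's alignment search to a greedy one-to-one pass, omits the WordNet stage, and uses invented example data.

    # Hedged sketch of the m-bleu idea: flexibly align words, rewrite matched
    # reference words to the hypothesis surface form, then score with plain BLEU.
    from nltk.stem import PorterStemmer
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    stemmer = PorterStemmer()

    def flexible_match(h, r):
        # Simplified matcher: exact or shared stem (WordNet stage omitted).
        return h.lower() == r.lower() or stemmer.stem(h) == stemmer.stem(r)

    def flexible_normalize(hyp, ref):
        # Greedy one-to-one alignment; matched ref words take the hyp surface form.
        used, new_ref = set(), []
        for r in ref:
            for i, h in enumerate(hyp):
                if i not in used and flexible_match(h, r):
                    used.add(i)
                    new_ref.append(h)
                    break
            else:
                new_ref.append(r)
        return new_ref

    hyp = "the cat sits on the mat".split()
    ref = "the cats sat on the mat".split()
    score = sentence_bleu([flexible_normalize(hyp, ref)], hyp,
                          smoothing_function=SmoothingFunction().method1)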
The Best Lexical Metric for Phrase-Based Statistical MT System Optimization
Translation systems are generally trained to optimize BLEU, but many alternative metrics are available. We explore how optimizing toward various automatic evaluation metrics (BLEU, METEOR, NIST, TER) affects the resulting model. We train a state-of-the-art MT system using MERT on many parameterizations of each metric and evaluate the resulting models on the other metrics and also using human judgments ...
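The tuning loop that abstract describes can be pictured, in much-simplified form, as a search over model weights that rescores fixed n-best lists and keeps whatever weights maximize the chosen metric. The sketch below replaces MERT's exact line search with random search for brevity; every data structure and parameter in it is illustrative.

    # Toy stand-in for metric-driven tuning: random search over feature weights.
    import random

    def rescore(nbest, weights):
        # Per sentence, pick the candidate with the highest weighted model score.
        return [max(cands,
                    key=lambda c: sum(w * f for w, f in zip(weights, c["features"])))["text"]
                for cands in nbest]

    def tune(nbest, metric, n_features, iters=1000, seed=0):
        # metric: callable mapping a list of 1-best outputs to a corpus-level
        # score (e.g. BLEU, METEOR, NIST, or negated TER against references).
        rng = random.Random(seed)
        best_w, best_score = None, float("-inf")
        for _ in range(iters):
            w = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
            s = metric(rescore(nbest, w))
            if s > best_score:
                best_w, best_score = w, s
        return best_w, best_score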
METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments
Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric. It is one of several automatic metrics used in this year's shared task within the ACL WMT-07 workshop. This paper recaps the technical details underlying the metric ...
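Concretely, given the alignment statistics, the parameterized Meteor score combines a weighted harmonic mean of unigram precision and recall with a fragmentation penalty. A sketch follows; the alpha, beta, gamma defaults shown are only illustrative stand-ins for the values these papers tune.

    # Sketch of the parameterized Meteor score from alignment counts:
    #   m = matched unigrams, ch = number of contiguous matched chunks,
    #   hyp_len / ref_len = hypothesis and reference lengths in words.
    def meteor_score(m, ch, hyp_len, ref_len, alpha=0.9, beta=3.0, gamma=0.5):
        if m == 0:
            return 0.0
        precision = m / hyp_len
        recall = m / ref_len
        fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
        penalty = gamma * (ch / m) ** beta  # fragmentation penalty
        return fmean * (1 - penalty)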
Metric combination for the Machine Translation optimisation tool MERT (The Prague Bulletin of Mathematical Linguistics, July 2011)
The main metric used for SMT system evaluation and optimisation is the BLEU score, but its relevance to human evaluation has been questioned. Other metrics already exist, but none of them is in perfect harmony with human evaluation. On the other hand, most evaluations use multiple metrics (BLEU, TER, METEOR, etc.). Systems can optimise toward metrics other than BLEU. But optimisation ...
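The combination that truncated abstract hints at can be as simple as a weighted mix of individual metric scores, with TER inverted so that higher is uniformly better. The function and weights below are placeholders, not values from the paper.

    # Illustrative linear combination of metric scores as a single tuning objective.
    def combined_score(bleu, meteor, ter, weights=(0.4, 0.4, 0.2)):
        # ter is an error rate, so 1 - ter makes all three terms "higher is better".
        w_bleu, w_meteor, w_ter = weights
        return w_bleu * bleu + w_meteor * meteor + w_ter * (1.0 - ter)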
Robust Machine Translation Evaluation with Entailment Features
Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We believe that the main reason is their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates MT ...
Publication year: 2008